ABSTRACT

The issues and some proposed solutions regarding
Follow Through (FT) site variability are examined with a review of
developments in FT evaluation. The role of adjusted site means with
differences within sponsors and between sponsors and background
characteristics is discussed to determine whether adjusted means are
the preferred measures of model effectiveness. In a Big City Group,
attrition bias in data for non-FT and FT site analysis is considered.
Improvements in measurement are shown in the sampling of content and
behavior, including the use of computer systems with broad content
samples. These procedures can eliminate reliance on multiple-choice
questions, and the use of classroom process data with student reports
on opportunity to learn (OTL) expands the dimensions of
variability. A contemporary model which crosses class type with
school sites illustrates the multilevel regression analysis. Student
scores are the dependent variable; and class type, sex, OTL class
mean and individual math ability are the independent variables. The
significant role of OTL in the stepwise fitting of the model is
shown. (CM)
Looking back at the Follow Through evaluation from a 1982 perspective,
one is struck by just how much things have changed in ten years.
There is much more respect for contextual effects and the need to link
achievement measures more closely to curriculum content. Contemporary
approaches, therefore, are not just fancier regression analyses but
include more complex designs, more exploration of the data and fancier
regression analyses. In this paper, some Follow Through evaluation
data will be revisited briefly, but one or two contemporary examples will
serve better to illustrate contemporary approaches. The purpose is to
discuss the issues and some proposed solutions rather than to argue the
Follow Through site variability question one way or the other.

Follow Through Revisited

Exploration. Raw score site means are displayed in Figure 1 for
several outcome variables from the FT evaluation. These are continued
in Figure 2, where background variables are added (again, site means).
These figures provide a graphic display of the within-site, across-sponsor
distributions. Examined in this form, it is clear that, as the
Abt Associates evaluation said (Stebbins et al., 1977), "the effectiveness
of each FT model varied substantially from site group to site
group." It is also plausible, though not as consistently clear, that
"overall model averages varied little in comparison." Model differences
appear to be larger on math computations than reading, but the range of
site means is large in both cases.

*A paper presented as part of the symposium "The Site Variability
Issue in Follow Through Revisited: Some New Data, Some New Methodologies
and New Insights." AERA annual meeting, New York, N.Y., March 19-23,
1982.

What is also clear is that the sites within each sponsor varied
substantially on background characteristics (Ethnic-linguistic, SES,
WRAT) as well as in "effectiveness" (scores on the MAT subtests). It
does not seem, however, that the average and range of site background
characteristics differed greatly from sponsor to sponsor.

Looking just at the Follow Through groups, the smallest range of
means on reading is 6 points (Behav. Anal.), a grade equivalent range
of about 11 months. The largest (Resp. Educ.) is 10 points. Sponsor
means range from 16 to 18. The exploration has yet to shake the
plausibility of the Abt finding on site variability.

Confirmation. Bereiter and Kurland (1978 and in press) took a
sensible tack (straightforward and conventional, Bereiter and Kurland,
1978, p. 3) and adjusted site means for background characteristics.
Insofar as achievement is related to SES and the like, some of the
variance we see can be attributed to background. Ethnic-linguistic and
SES measures are correlated with achievement, so one expects covariance
analysis to affect the results, and it does. Differences among sponsor
means that were not statistically significant before become significant.
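For reference, the adjustment described above can be written in the textbook one-covariate form. The notation is an assumption introduced here for illustration and is not taken from Bereiter and Kurland: sites are indexed by i within sponsor j, X is a single background covariate, and b_w is the slope pooled within sponsors. Their actual specification may differ.

```latex
\bar{Y}_j^{\mathrm{adj}} = \bar{Y}_j - b_w\,(\bar{X}_j - \bar{X}),
\qquad
b_w = \frac{\sum_j \sum_i (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j)}
           {\sum_j \sum_i (X_{ij} - \bar{X}_j)^2}
```

When the sponsor covariate means are all close to the grand mean, the correction term is small and adjusted means stay close to unadjusted means, while the residual (error) variance can still shrink.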
A previous exploratory observation was confirmed by the covariance
analysis when the differences among adjusted means were seen to be
virtually identical to those among unadjusted means. Background
differences were similar from sponsor to sponsor. The covariance adjustment
(shown by Bereiter and Kurland to be robust) has reduced the error
variance and allowed us to infer with some confidence that the
differences we were observing between and among sponsor means are not likely
sampling fluctuations or other statistical artifacts.

But what of site variability? Estimates of between-sponsor
differences were unchanged, but the estimates of within-sponsor variability
(site variability) were reduced. Now, overall (adjusted) model averages
vary more in comparison to the variation in adjusted means from site to
site. What remains to ask is whether the adjusted means are the
preferred measures of model "effectiveness."

That this may not be completely straightforward was argued by
Cronbach, Rogosa, Floden and Price (1977), and it doesn't seem completely
straightforward to take the reduced variance estimate as proof that
differences among models previously regarded as modest in context should
now be regarded as important. No doubt Kurland will clarify the matter
in his paper (Kurland, 1982). Before considering other confirmatory
analyses, consider one more contextual issue raised by exploratory
analysis.

The Big City Group. Substantial attrition did occur over the
three years of the evaluation (Stebbins et al., 1977, p. 82), but AAI
were persuaded that no bias resulted. Pursuing the attrition matter,
McLean (1978) plotted differential attrition (FT vs NFT) against
differential WRAT scores, with a result (Figure 3) that suggested an
attrition bias acted against the non-Follow Through groups. (The
trend from upper left to lower right is significant: r = -.51, p < .01.)
More low-scoring students dropped out of the Follow Through than the
non-Follow Through groups.

The bottom right-hand quadrant in Figure 3 contains sites for
which the FT attrition exceeded the NFT and for which the FT WRAT scores
were lower than NFT. When these sites were identified, all but two of
the big city sites were there, and the other two were nearby. The
sites in the upper left-hand quadrant turned out to be smaller
communities, suggesting a contextual effect that had not been turned up in the
omnibus analyses. This type of analysis has been followed up by
Gersten (1982).
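The site-level computation behind such a plot can be sketched as follows. This is a minimal illustration only: the site records are hypothetical values invented for the example (not the Follow Through data), and the differential is taken here as NFT minus FT.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical site records (NOT the Follow Through data):
# (FT attrition %, NFT attrition %, FT WRAT mean, NFT WRAT mean)
sites = [
    (40.0, 30.0, 28.0, 33.0),
    (35.0, 38.0, 31.0, 30.0),
    (50.0, 35.0, 27.0, 34.0),
    (30.0, 42.0, 34.0, 29.0),
    (45.0, 44.0, 30.0, 31.0),
]

# Differential = NFT - FT for each site (assumed sign convention).
diff_attrition = [nft_a - ft_a for ft_a, nft_a, _, _ in sites]
diff_wrat = [nft_w - ft_w for _, _, ft_w, nft_w in sites]

r = pearson_r(diff_attrition, diff_wrat)
print(f"r = {r:.2f}")
```

A negative r on real site data would correspond to the upper-left to lower-right trend described in the text; with so few sites, any subgrouping leaves very little to correlate.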

A purely site-level analysis cannot be refined to any extent (by
grouping, for example) because the sample size is too small. Combining
student-level and site-level data would be an attractive alternative,
to be discussed in the last section. First, however, consider how
content and measuring techniques might affect site and model variability.

Improvements in Measurement and in the Sampling of Content and Behavior

The narrow coverage of early childhood outcomes was criticized
by House et al. (1978), and a number of sponsors felt keenly that the
measures selected for the evaluation were not valid indicators of the
effectiveness of their programs. Certainly the multiple-choice format
dictated by the technical and financial constraints placed on the
evaluation severely restricted the sample of student behavior obtained from
these nine-year-olds (not to speak of the five- and six-year-olds).
In short, the observed variability was a drastically reduced sample of
reality.

Since large item pools are available that may be used with item
sampling techniques, there is no longer any excuse for poor curriculum
coverage in a large, important study. With a total cost estimated at
$30-$50 million (House et al., 1978, p. 129; 1977 dollars), the Follow
Through evaluation certainly qualified as large and important.

Modern computer systems have also removed the need to rely
exclusively on multiple-choice questions in large evaluations or
assessments. As an example, the 1981 Field Trials of the Ontario Assessment
Instrument Pools in mathematics and English involved over 37,000 students
in grades 7 to 10 in 180 schools, as well as 1000 English and 600
mathematics instruments, most of which required a constructed response.

All responses were entered to computer files, checked and readied
for scoring in eight weeks, by specially trained clerks using custom
computer programs. Subsequent analysis steps were largely the same as
those that confronted the AAI staff, with two important elaborations.
First, the content sample was broader and more finely stratified. The
mathematics content included 55 terminal objectives, for example, each
of which was represented in the field trial by six examples. Sixteen
topics (analogous to subtests) were chosen for summaries (e.g., whole
numbers, decimals, fractions, integers, algebra and the like, elementary
and intermediate).

The second elaboration was the inclusion of classroom process
data, along with student reports on opportunity-to-learn (OTL). These
latter were suggested by association with the Second International
Mathematics Study (SIMS), now almost complete in 23 countries. In SIMS,
both student and teacher OTL reports were collected, along with elaborate
reports on teacher math constructs and classroom procedures. An important
practical result of these elaborations is that the dimensions of variability
expand exponentially, demanding new data analytic approaches. There are
many from which to choose, and this paper might better have been entitled
"A few of the simpler contemporary approaches . . ."
A Contemporary Example

Three Class-types Crossed with Nine Schools Nested Within Four School Boards

Class 1 (13)    Class 3 (27)
Class 1 (16)    Class 2 (31)
Board
Class 1 (15)    Class 2 (29)
S8
Class 1 (11)    Class 3 (14)
S2
Class 1 (13)    Class 2 (29)
S5
Class 1 (14)    Class 3 (23)
S7
Class 1 (10)    Class 3 (25)
S9
Class 2 (22)    Class 3 (29)
S3
Class 1 (11)    Class 3 (25)

Class 1: Basic level, low achievement
Class 2: General level, cross section
Class 3: Advanced level, high achievement

Contemporary . . . Belonging to same time or of same age, esp. as oneself;
(ultra) modern in style or design (Oxford Pocket Dictionary, 6th ed., 1978).
Variance Decomposition

Boards                 112.7      1%
Schools/Bd            1394.2     13%
Class Level           2867.5     26%
Class-L x School       390.2      4%
Within School         6062.7     56%

Total                10827.3

Total N                  357
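The percentage column can be checked directly from the entries in the table. The short sketch below transcribes those entries and recomputes each component's share of the total; the dictionary keys are simply the row labels, and nothing beyond the tabled values is assumed.

```python
# Entries transcribed from the variance decomposition table above.
components = {
    "Boards": 112.7,
    "Schools/Bd": 1394.2,
    "Class Level": 2867.5,
    "Class-L x School": 390.2,
    "Within School": 6062.7,
}

total = sum(components.values())
shares = {name: 100.0 * value / total for name, value in components.items()}

for name, value in components.items():
    print(f"{name:<18}{value:>9.1f}{shares[name]:>6.0f}%")
print(f"{'Total':<18}{total:>9.1f}")
```

The components sum to the tabled total, and more than half of the variation lies within schools, which is the part that only student-level predictors can address.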

Multilevel regression. Burstein³ advocates fitting regression
models containing both aggregated and student-level data. Such a model
for the data in the contemporary example might include:

Dependent Variable:
    Total math score (student level)

Independent Variables:
    1. Class type (categorical)               Basic, General, Advanced
    2. Sex (categorical)                      Female, Male
    3. Opportunity to Learn (class mean)      Scale: 0 to 20
    4. Relative Math Ability (student level)  Total Math - Class Mean (student),
                                              Subtest Prerequisites
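The mixing of levels in the variable list above can be illustrated by constructing the predictors. This is a sketch under stated assumptions, not the analysis reported here: the student records are hypothetical, the field names are invented for the example, and relative ability is taken to be a student's score minus the mean of that student's class. The stepwise regression fit itself is omitted.

```python
from collections import defaultdict

# Hypothetical student records (illustrative values only):
# (class id, class type, sex, OTL report on a 0-20 scale, math ability score)
students = [
    ("A", "Basic", "F", 8, 10.0),
    ("A", "Basic", "M", 10, 14.0),
    ("B", "General", "F", 12, 18.0),
    ("B", "General", "M", 14, 22.0),
    ("B", "General", "F", 16, 20.0),
]

def class_means(records, value_index):
    """Mean of one numeric field for each class id."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        sums[rec[0]] += rec[value_index]
        counts[rec[0]] += 1
    return {cid: sums[cid] / counts[cid] for cid in sums}

otl_mean = class_means(students, 3)      # aggregated (class-level) predictor
ability_mean = class_means(students, 4)  # used only to centre student scores

design_rows = []
for cid, ctype, sex, otl, ability in students:
    design_rows.append({
        "class_type": ctype,                               # categorical
        "sex": sex,                                        # categorical
        "otl_class_mean": otl_mean[cid],                   # same for all classmates
        "relative_ability": ability - ability_mean[cid],   # student-level deviation
    })
```

Because relative ability is centred within each class, it carries no between-class information; class type and the OTL class mean account for differences among classes, and the deviation accounts for differences among students within a class.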



³ Burstein, Leigh. Explanatory models using between and within
class regression: basic concepts and an example. Paper presented at
the data analysis workshop, Second International Mathematics Study,
Toronto, Canada, December 7-11, 1981.
The result of fitting such a model (stepwise)

Figure 4 is a scatterplot of OTL with math score (class means).
It is interesting to observe that OTL is a powerful variable (pooled
within class correlation with score is 0.5) over and above differences
among classes and schools.
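A pooled within-class correlation of the kind quoted above removes each class's own mean from both variables before correlating. The sketch below shows one way to compute it; the function name and the toy data are assumptions made for illustration and do not reproduce the field trial data.

```python
from collections import defaultdict
from math import sqrt

def pooled_within_correlation(groups, xs, ys):
    """Correlate x and y after removing each group's mean from both."""
    members = defaultdict(list)
    for g, x, y in zip(groups, xs, ys):
        members[g].append((x, y))
    sxy = sxx = syy = 0.0
    for pairs in members.values():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
    return sxy / sqrt(sxx * syy)

# Toy data (hypothetical): within each class OTL and score rise together,
# although the class with higher OTL has the lower scores overall.
groups = ["A", "A", "A", "B", "B", "B"]
otl = [1, 2, 3, 11, 12, 13]
score = [5, 6, 7, 1, 2, 3]

r_within = pooled_within_correlation(groups, otl, score)
print(f"pooled within-class r = {r_within:.2f}")
```

In the toy data the within-class relationship is perfectly positive even though the two class means run in the opposite direction, which is why a within-class coefficient speaks to effects over and above differences among classes.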

The lesson this author draws with regard to Follow Through is
that the issue of site variability probably cannot be adequately
explored with the data as collected. We might best move on to other
tasks.
Fig. 1. Plots of site means for FT (solid circles) and NFT (open circles),
data from Abt Associates Inc. IV-C, 1977, and [illegible], 1976. Means
are of raw scores on the Metropolitan Achievement Tests (Elementary
version) administered at the end of third grade. Data are included
from cohorts II-K, II-EF, III-K and III-EF. Horizontal lines
indicate averages of site means.
Fig. 3. Differential attrition (horizontal axis) plotted against differential
in WRAT mean scores (vertical axis) at sites for eight largest Follow
Through sponsors. Differential = NFT - FT. Sites where the NFT had
lower WRAT scores than FT had higher NFT attrition (r = -.51).
Figure 4. Scatterplot showing strong relationship between student reports
of when material was taught and student achievement.